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C^ , Abstract 

The problem of approximating a sampled function using sums of a fixed number of complex exponentials is 
considered. We use alternating projections between fixed rank matrices and Hankel matrices to obtain such an ap- 
proximation. Convergence, convergence rates and error estimates for this technique are proven, and fast algorithms 
^ ■ are developed. We compare the numerical results obtain with the MUSIC and ESPRIT methods. 

OO 
(N 
O ■ 1 Introduction 

(N 

f~^ ■ The present paper is devoted to the problem of approximating a sampled function with a sum of a given number of 

^^ I complex exponentials. Our approach is based on the fact that if such a sum is used as a generating function for a 

Hankel matrix, then that Hankel matrix will (generically) be of rank k. Using this fact, we develop a method for the 
detection of complex frequencies from a signal by alternating projections: we project the corresponding Hankel matrix 
onto the class of symmetric rank fc-matrices, project the projection on the class of Hankel matrices, and so on. By a 
complex frequency , we refer to the coefficient (^ e C in an exponential of the form f M' e""*. 

There are several alternative techniques for the estimation of (complex) frequencies from a signal. Two of the most 
C^ ' commonly used ones are multiple signal classification (MUSIC) |24 7J and estimation using rotational invariance 

(ESPRIT) 12311 . The MUSIC method is a generalization of the Pisarenko method ||2T|| . Recently, complex frequency 
estimation has been used in the construction of close to optimal quadratures, for instance for bandlimited functions 
IB]. This work is related to the work of Adamjan, Arov and Krein [IJ, and the algorithms described in |6j have been 
investigated in more detail in IS). 

The technique of alternating projections is generally described as follows: Given two manifolds M.i^M.2 C JC 
(where /C is some Hilbert space) and some point xq £ JC, find a point x e A^i n M.2 that is close to xq, by projecting 
alternately onto M.i and A^2, respectively. It was proven by von Neumann ifTSl that if TMi and 7M2 are affine linear 
subspaces, then the sequence of alternating projections 

7ri(a::o),7r2(7ri(a;o)),7ri(7r2(7ri(a;o))), . . . 

converges to an optimal solution x <E M.iC\ M.2, i.e., one that minimizes ||a; — a;o||. 



X 



The extension to the case where A^i and A^2 are convex sets has been extensively treated for a number of appU- 
cations, cf. plfSlI and the references therein. Another generahzation was given in fl3\, where the convergence of the 
alternating projection scheme was proven for the case where, loosely speaking, the tangent spaces of A^i and A^2 
together span /C. Note that only convergence to some point in A^i n A^2 can be proven, but that this point is not 
necessarily the point in A^i n A^2 that is closest to xq. 

Moreover, neither of the cases above apply to the case which we are interested in, as the space of rank fc-matrices 
is not convex, and the spanning condition is typically far from satisfied. In ||2l, convergence of alternating projections 
between two manifolds is proved under much milder conditions than the ones given in IfTSl . In this paper we prove 
that these conditions are generically satisfied in our case; complex symmetric rank fc-matrices and Hankel matrices. 
Moreover, through the framework of ||2l we can provide estimates for how far away from the initial (sampled) function 
the approximating fc-term complex exponential sum will be. 

The idea of using alternating projections for frequency estimation has appeared in a number of different settings. 
The method of alternating projection is commonly referred to as Cadzows method in the signal processing community. 
In ||8l, Zangwill's global convergence theorem is used to prove convergence for algorithms with alternately projects 
onto (possibly more than two) manifolds. However, Zangwill's theorem only provides the existence of a convergent 
subsequence, and the results in IS) do not give any information on whether or not the point of convergence is close 
to the original one, cf. ||9). In the paper IH, several applications are mentioned; one of them is the projection 
between finite rank matrices and Toeplitz matrices. Toeplitz matrices appear in the estimation of exponentials by using 
infinite measurement (or expected value) of autocorrelation matrices. For an (infinitely dense) sampling of a function 
consisting of k complex exponentials, it is possible to form a Toeplitz matrix from which the fc frequencies can be 
recovered. It is worth mentioning that for a finite sampling of a function with k complex frequencies, the resulting 
autocorreletion function will not have a Toeplitz structure, and hence the frequencies can not be exactly recovered with 
this method, even in the absence of noise. A survey of problems of approximations using a combination of structured 
matrices and low-rank matrices is given in ifTTl . Alternating projections is mentioned as one of the numerical methods 
for finding approximate solutions. 

The use of alternating projections (Cadzow's method) between Hankel and low-rank matrices has appeared several 
times in the signal processing literature lfT4l ITSl l22l . The approaches differ in the way the complex frequencies are 
estimated, once the alternative projection method has converged. 

In this paper we develop fast methods for the projection steps. We make use of the fact that multiplication by a 
Hankel matrix, as well as the projection of low rank matrices onto Hankel matrices, can be computed in a fast manner 
by the use of FFT. For the projection onto low-rank representations, we will use a customized complex symmetric 
version of the Lanzcos algorithm. 

Finally, we consider the approximation by exponentials for a particular class of weighted spaces - including 
(approximate) Gaussian weights. Let w be a nonnegative function on R with support [—1,1] and let 

UJ ~ w * w. (1) 

Let L'^{uj) denote the set of functions for which ||/||^ = J_^ \f{t)\'^uj{t) dt < oo. Given / G L'^{oj) and fc e N, 
we are interested in computationally efficient methods for finding the best (or close to best) approximation of / by 
functions of the form J2j=i Cjc'"^*. In this paper we develop a theory for finite sequences rather than functions on a 
continuum. Using techniques similar to those developed in ID, it seems to be possible to develop a similar technique 
for the approximation of functions on a continuum by a finite number of complex exponentials. 

2 Preliminaries 



In section lZTI we give the necessary tools for projection onto matrices of a certain rank and set up the spaces we will 
work with. In section IZ21 we describe how to go from a Hankel matrix to its symbol and back, in these spaces. 



2.1 Takagi factorization and the Eckart- Young theorem 

We use the notation Mj\f _jv to denote the Hilbert space of M x A^ matrices with complex entries, equipped with the 
Frobenius norm, given by 

M N 

pf =^^iA(j,or (2) 

Complex symmetric matrices satisfy the symmetry condition A = A^ , which is different from the usual (Her- 
mitian) self-adjointness condition A ^ A*. Similarly to real symmetric matrices, which are always diagonalizable, 
complex symmetric matrices can be decomposed as 

N 
m=l 

where the vectors {um}m are mutually orthogonal. (As usual, elements of C^ are identified by column matrices, 
and u* is the adjoint, i.e. the transpose of the complex conjugate of u.) This decomposition of A is called a Takagi 
factorization. Note that in contrast to the Hermitian case, the numbers Sm are nonnegative. Moreover, the vectors Um 
satisfy the relation 

In lfT2ll . the vectors u,„ are referred to as con-eigenvectors and the positive numbers s,„ are referred to as con- 
eigenvalues. However, the con-eigenvectors are simply singular vectors (obtained from the Singular Value Decompo- 
sition), and the con-eigenvalues are the singular values. This is seen by noting that 

The converse is not true, since it is easily seen that e.g. m™ fails to be a con-eigenvector but is still a singular vector. 
However, in the case where the s^'s are distinct and (um)m=i is any basis of singular vectors, then one can choose 
0m G [0, 2tt), such that {e'^^'"u,n)m=i ^^^ con-eigenvectors. For the purposes of this paper, we are only interested in 
the zeroes of the corresponding polynomials, and hence the fl^'s have no importance, but it will be computationally 
more convenient to extract the con-eigenvectors, and we have thus chosen to use this terminology. 

We recall the Eckart- Young theorem (see e.g. lfT2l p 205], ifTOl ). (usually stated using the singular vectors): 

Theorem 1 Let A G Mjv.at be a complex symmetric matrix with distinct con-eigenvalues. Given a positive integer 
k < N, the best rank k approximation of A (in Mtv.at) is given by 



/ ^ ^m'^ni^rn^ \^) 



where Sm and Um are the (decreasingly ordered) con-eigenvalues and con-eigenvectors of A, respectively. 

The above theorem can clearly be used to project a given matrix onto the closest rank k matrix (with respect to the 
Frobenius norm). We will also make use of approximations in weighted spaces. Given a positive weight w G R^, we 
denote by M)^ ^ the Hilbert space of matrices with the weighted Frobenius norm, given by 

N 

m\l= E Mj)\Mj,k)f^{k) = \\dmg{V^)Admg{V^)f. 

Theorem 2 Let A G M]^ jy, andlet s,„ andq,n. denote con-eigenvalues and con-eigenvectors of B = 6\'Ag{y/w) A diag(v^). 
Then the best rank k-approximation of A (in Mi^j^ ^) is given by 



k 
m— 1 



E 



where u„(0 = q„-,il)/ y/W)' i<l<N. 



Proof: By definition 



771—1 



diag(Vw)M- ^ s. 



m'^rrt^rn 



diag(-v/w) 



S - ^ Sm {dia.g{^/w)um) {diag{ 

7n—l 



W)Ur, 



which according to Theorem[T]is minimized by choosing Uj„ ~ (diag(y'w))^^(7„i and by choosing s,„ as the con- 
eigenvalues of B. 

There are different ways to compute Takagi factorizations. We indicate one method, the first step of which is the 
following proposition. 



Proposition 1 Let A and B be real symmetric {N x N)-matrices and let 

W -- 



A -B 
-B -A 

Let di > d2 > ■ . ■ > d2N be the eigenvalues of W . Then dj+d2N+i-j — Ofarj — 1,2,..., 2N, and an orthononnal 
basis of eigenvectors can be chosen as 






X2 
Y2 



X2n 
Y2n 



where Xj, Yj E R^, X2n+i-j = ^Yj and Y2n+i~j ~ Xj for j = 1,2, . . . , N. 
The proof is given as an exercise in lfT2l . 

2.2 Hankel matrices 

A Hankel matrix A has constant entries on the anti-diagonals, i.e. it satisfies the relation 

A{jJ)^A{j',l'), ifj + l = j' + l'. 

Every Hankel matrix A G Mjvjv can thus be generated from some vector / = {Ij)]=2 ^y 

A{3,l)=Hf{3,l) = f{j+l), l<j,l<N. 

An orthonormal basis for the Hankel matrices in Mjv.jv is given by 

f , ^ , if ?■ + / = m; 

0, otherwise. 



(5) 



(6) 



for 2 < m < 2N, where the normalization factor originates from the number of elements along anti-diagonal m. 
When considering Hankel matrices in weighted spaces we need to use proper normalization; the basis elements should 
be normalized with respect to the induced (matrix) weights along the anti-diagonal. We associate the weights 



;(m) = ^ w{j)w{l), 2 < m < 2N, 

j-\-l—m 



(7) 



to w, and note that this can be written as a discrete convolution uj = w * w, where w denotes the zero padded version 
of w. A basis for Hankel matrices in the weighted space M^ jy is then given by 



I 0, otherwise. 



for 2 < m < 2N . Note that in the case w = 1 we get the "triangle weight" which appeared in (|6]). We let ^27V-i ^"6 
the space of complex sequences / = [fj )'j=2 ' equipped with the norm defined by 

j 

The mapping H (given by ^) will in the sequel be considered as a mapping from i-2N-i ^'^ 
a unitary map (isometric isomorphism), whose adjoint is the weighted averaging operator 

1 



^N,N, 



It is 



H*A{m) 



uj{m) 



J2 wU)Aij,l)w{l), 



(8) 



and H*H = I. The following proposition is now immediate. 

Proposition 2 Let w G M;^ be given and let uj be the associated weight defined by ((T]). Let f = (/j)?=2 '^^^ '^^ '^ ^^ 
any set ofHankel matrices. Then the problem 

a.i-gnim\\Hf - H\\^ 
Hes 



is equivalent to the problem 

The solutions are related by Hg = H. 



argminW f - g\\i^ 



3 Properties of fixed-rank and Hankel matrices 

The key observation behind the algorithms of this paper is that a rank k Hankel operator generically has a symbol 
which is a sum of k exponentials. However, this is not always true, and neither is the projection onto rank k matrices, 
given by Theorem [U well defined at all points. In this section we show that the exceptional set is very small. We 
introduce the concept of a thin set, and show that the exceptional points are confined to thin sets. 

We denote by Hn the set ofHankel matrices in Mjvjv, and 7?.Ar.fc will denote the set of matrices in Matjv of rank 
at most k. 



3.1 Manifold structure 

In this entire section, we will work with subsets of Mtv.w, consisting of matrices whose entries are ordered from 1 
to N. "H is a linear subspace of Matat and, hence, a differentiable manifold of (real) dimension 2{2N — 1). By 
identifying C with M."^ in the obvious way, a simple modification of H (defined in (|5]l) provides a natural chart. The 
structure ofTlN,k is more complicated; we will show that it is a manifold of (real) dimension 2{2Nk — fc^) outside a 
small exceptional set. Suppose A G Ti-M.k^ ^nd use the singular value decomposition of A to find a a & (M^)'' and 
Ua, Va such that U^Ua ~ V^Va = Ik (where Ik is the k x k identity matrix and Ua, Va are A^ x fc-matrices) and 

/ TA,! • • • \ 



A = Va 







fA,2 



V 




aA,k J 



U*A = VAlaU*A. 



A typical matrix in IZN,k satisfies 



CTAA > a-A.2 > ■■■ > (JA.k > 0, 



(9) 



(10) 



and, if this is not the case, an arbitrary small numerical perturbation will yield distinct singular values. The subset of 
TlN,k, consisting of iV x TV-matrices satisfying (fTol i. will be denoted 71% j,, where d stands for "distinct". If Al is a 
manifold and E is a set, contained in the union of finitely many manifolds of dimension lower than the dimension of 
A4, we will say that E is thin relatively to M. 



Propositions 7?.^ j, C M^vtv is a manifold of (real) dimension 2{2Nk — fc^). Moreover, TZjy^k = ^w fe '^'^'^ 
Ti-N,k \ Ti-% k '■* ^'^'" relatively to TZN,k- 

Proof: We start by remarking that the set U{N, k) of complex N x fc-matrices U satisfying U*U = /fc is a real 
manifold of dimension 2Nk — k^. Namely, the columns [/(-, 1), [/(-, 2), ...,[/(-, fc) of such a matrix can be identified 
with points on 5*^^^^, and thus U{N, k) can be identified with the subset of elements U G [S"^^^^)^, satisfying the 
functionally independent equations 

Re [/*(., j)[/(.,0 - Im (7*(-,j)C/(-,0 - 0, 1 < / < j < fc. 

The number of these equations is 

2 + 4 + ... + 2(fc-l) = fc2-fc, 

and thus U{N, k) is a manifold of dimension 

k(2N - 1) - (fc2 - fc) = 2Nk - fc2. 

Now let A G TZJ^ f.. Then there are matrices Ua and Va in U{N, k) and a vector cTyi G R'j^ with <7a,i > crA:2 > 
. . . > (JA,k^ such that 

k 

A = Va diag(aA) f/1 = 5^aA,,VA(-,j)C/l(-,j). 

In this representation, the numbers <ta .j are uniquely determined by A, and so are the products Va ( • , j ) U\ ( j' ) , but the 
vectors Ua{-,J) and V4(-, j) are not; each vector Ua{-,J) can be multiplied by a complex unit factor e*^^ G S^ and 
Va(-, j) by the same factor, whence the product Va{- , j)UX{- , j) remains unaffected. We can thus define a mapping 

F -.nix (S^)'' -^ U{N, fc) X {cr G R^ (Ti > 0-2 > ■ • . > <Jk > 0} X U{N, k) 

by 

F(A,(e^^O'=i) = (V4diag(e^«0'=i, diag(aA), C/Adiag(e*«0'=i) • 
It is easily verified that this mapping is a diffeomorphism, and hence 

dim7^^ + fc = (2A^fc-fc2) + fc + (2iVfc-fc2), i.e. dim7^^ = 2(2A^fc - fc^) 

We omit a proof of the remaining statements, which can be obtained by standard matrix theory and differential geom- 
etry. 

Given a matrix A G Mjv,jv, the closest point in TZM.k is given by the Eckart- Young theorem, and it is unique 
whenever the singular values are distinct. By the above theorem, it is very improbable that this would not be the 
case for an arbitrary matrix A. Indeed, when working with "real numerical" data this never happens, so we will for 
simplicity treat the projection onto TlM.k as a well defined map which we denote by tt-jin k ■ ^ more stringent approach 
would be to work with "point to set"-maps, as in fS] and ||3T1 . 

Infinite HanJiel matrices of finite rank 

To understand the structure of Hankel matrices, it seems indispensable to consider infinte Hankel matrices, by which 
we mean complex-valued functions on N x N, where N = {0, 1,2,.. .} (in this section we include in the index 
set for convenience). For a complex valued funktion / on N, we denote by H f the infinite Hankel matrix with 
Hf{j^ I) = f{j + I)- This means that H is an operator from C'^ to C'^'. 

The rank of an infinite matrix is the dimension of its column space (the linear space generated by its columns). 

Assume that A = Hf is an infinite Hankel matrix, such that some column is a (complex) linear combination of 
the preceding ones (i.e. rank A < oo). Lat A{-, r) be the fist one of these. It thus holds 



A(j,r)+^A,AO-,0=0 



;=o 



for all j e N (where Aq, . . . , A^-i are complex numbers), which means that 



fij+r)+J2>^,.f{j+l) = 0, 



1=0 



I.e. 



r-l 



/(fc) + ^Az/(fc-r + = 0, k>r. 



(11) 



;=o 



Vi find that every column, starting with A{-, r), is a linear combination (with the same coefficients) of the r preceding 
columns, and we conclude that r is the rank of the matrix. 

Theorem 3 Let A be an infinite Hankel matrix of rank r < oo. Then 



A{0,0) A{0,1) 

A{1,0) A{1,1) 

A{r~ 1,0) yl(r-Ll) 



A(0,r-1) 
A(l,r-1) 

A{r-l,r-l) 



^0. 



Proof: Assume that the determinant vanishes. We have seen that every column, starting with A{-,r) is a linear 
combination (with the same coefficients) of the r preceding ones, and in the same way we see that the corresponding 
relation holds for the rows. It now follows that 



AijuO) AiJul) 
A{j2,0) A{j2,l) 

A(>,0) AiJrA) 



AiJur-1) 
AU2,r-l) 

AiJr.r^l) 



whenever < ji < J2 < ■ ■ ■ < jr, since every row in this determinant is a linear kombination of the linearly 
dependent rows ^(0, •), A{1, •),..., A{r — 1, •). This means that the first r columns of A are linearly dependent, 
contrary to the observations made above. 

We now study the generating function for /: 



F(x) = ^/(fc)a 



k=0 



Using ( fTTI ). we get 



Fix) 



/(O) + E;:; fik) + EL".-. A,/(fc -r + l) 



(12) 



1 + ELi K-kx'' 

In this quotient, the degree of the numerator is at most r — 1. If Ao 7^ 0, the degree of the denominator is r, and there 
is an expansion 

p m„ — 1 

where rrii + m2 + . . . + nip = r, and a^_^ are constants with a^.,„,^ 7^ 0. Hence 



p m,y — 1 00 

EE^^..E 



flt/.M a'^ c,^ '^ ^ Y^ y^ Qt/.M " y^ ^fc-M^ 
u! dxt" 1 - Cx 2^ 2^ n\ dxi^ ^ " 

i/=l ^=0 ^ ^'^ v=\ p=0 ^ fc=0 



v=\ /j=0 



k=^ 



p m„ — 1 



iC.x) 



k — fi 



E E «-'mE 



i/=i ^=0 



A:=0 



k + /i 



(C.a;) 






k=Oi'=l M=0 



We find that 



/(fc) = EQ.(A:)C 



where the Q^ are polynomials of degree irii, — 1, u = 1,2,. . . ,p. 

If Ao = 0, the numerator in (fT2] l is of degree r — 1, because otherwise the columns A{-, r — 1) would be a linear 
combination of the preceding ones, contrary to our choice of r. In this case we let d be the first number with A^ ^ 0, 
and a polynomial division yields 



p rn^ — 1 



F{.)^Q{x) + Y: E (1 _ Zy^i ^ degQix)^d-l 



u=l ^=0 



where toi + 7712 + . . . + nip = r — d, and the a^^^ are constants with a^.m^ ^ 0. 
If we define 6{j) as when j ^ and 1 when j ~ 0, we can write 

Theorem 4 Lef A = H f be an infinite Hankel matrix of finite rank r. Then 

d-l p 

/(fc)-E^M'^(^-/^)+EQ-wc'' 

where Cd-i 7^ (in case d > 1), Q^ are polynomials with dcg Qu ~ rrii, — 1, and d + mi + 7712 
We can also write 



+ . . . + m„ ^ r. 



d—l p m„ — 1 

/(.? + = E ^A'-^C? + ' - m) + E E 9.,pO')C^^C', degq^.f, = m, - 1 - /i. 

/.i— 7y— 1 /I— 

We will now investigate how the nodes d^j^ can be deterniined. We put 

p r 



P(x)=a;''JJ(x-C.)"''-E^'^' 



v=l 



1=0 



Then, for u ^ 1,2, ... ,p. 



i.e. 

r 

Y.^il'^Cl^O, M = 0,l,...,m,-1. 
It also holds A; = if ^ < rf. Hence, for ^ = 0, 1, 2, . . . , r, 

r d—1 r p rrijj — l r 

E /(j' + ^)^' = E '^A. E -^(j' + ^ - ^)^' + E E '?-'A<(j')^^ E ^'^'^^' = 

Now define, for any nonnegative integer k, the upper left corner submatrix of order fc + 1 by 



Ak = 



/ ^(0,0) A{Q,1) ... A{Q,k) \ 
A{1,Q) A{1,1) ... A{l,k) 

\ A{k,Q) A{k,l) ... A{k,k) J 



We know that dct Ar-i 7^ and det Ar = 0. Hence the kernel for A,- is one-dimensional, and we have characterized 
it: It is generated by the vector (Aq, Ai , . . . , A^), where 



r p 



]xix'^x''l[{x~cy 



1=0 v=l 



We now observe that the numbers Aq, Ai, . . . , A^ are exactly the numbers appearing in ( fTTT i (with A,. = 1), and using 
that recursion equation, it is easily seen that for k > r, the (fc + 1 — r) -dimensional kernel of A^ is generated 
by the vectors (0, . . . , 0, Aq, Ai, . . . , A^-i, A^, 0, . . . , 0). The coordinates of these vectors are the coefficients in the 
polynomials x'-P{x), / = 0, 1, . . . , fc — r. We now summarize the observations made: 

Proposition 4 Let A = H f be an infinite Hankel matrix of rank r < 00. Then 



d-l p 

f{k) = Y, c^s{k - m) + E Q-(fc)c^ 

where d > 0, Cd~i 7^ (in case d > 1), Qi, are polynomials with deg Qu = rrii, — 1 and d + mi + 7712 + . . • + 
rUp = r. 

Ifk > r, the vector {pq, p,i, ... ^ p,k) belongs to the kernel of Ak if and only if there is a polynomial Q of degree 
at most k — r, such that 



We call the polynomial 






P(x)=x'^n(:r-C.r 



the central polynomial for A. 

Finite Hankel matrices 

For an infinite Hankel matrix of finite rank r, we have seen that the upper left corner matrix of order r is non- 
singular. For finite Hankel matrices, this will not always be the case. Let A be a Hankel matrix of size N x N, i.e a 
complex.valued function on {{j, I) € N"^; < j < N - 1, < I < N - 1}, such that A{j, I) = A{j' , I') whenever 
j + I = j' + I'. Then there are infinitely many fucnctions / on N, such that Ajj = f{j + I). Such a function / is 
determined hy A = H f only on the set {0, 1, . . . , IN — 2}. Vi will now discuss "canonical" extensions of A to N^. 

Theorem 5 Ldt A = H f be a N x N Hankel matrix of rank r < N and assume that its upper left corner submatrix 
Ar of order r is non-singular Then there are uniquely determined constants Ao, Ai, . . . , A^-i, such that 

r 

f{k) + Y\if{k^r + l) = {), k = r,r + l,...,2N~2. (14) 

1=0 

Proof: We have 

f{k) ^A{0,k), 0< fc<r-l 

and, since the column A{-,r) is a linear combination of the linearly independent columns A{-, I), I = 0,1, . . . ,r — 1, 
there are uniquely determined constants Aq, Ai, . . . , A^-i, such that 

r— 1 r— 1 

fU + r) = A{j, r) = - E XiA{j, = - E ^'Z^-?' + ^)' < ^' < ^ - 1- 
;=o 1=0 



The relation ( fT4l t is thus vaHd for k < N — 1 ^ r. Consequently, for A: = r + l,r + 2,...,A/^— 1, 



r—l r—1 

MJ, k) = fij + fc) = - ^ XifU + k-r + l) = -Y, ^iMJ, k-r + l), j = 0, 1, . . . , r - 1, 

(=0 (=0 

FoUowingly the same relation holds for j = r, . . . ,N ~ 1, and thus the recursion formula (fT4] l holds for k = r,r + 
l,...,2iV-2. 

A function /, satisfying ( fT4l i. has of course a unique extension to a function on N, satisfying the same relation. We 
conclude that if the condition on the upper left corner submatrix is fulfilled, then A has a canonical rank-preserving 
extension to an infinite Hankel matrix. If not, any extension to an infinite Hankel matrix is necessarily of a strictly 
higher rank. The first case is of course generic, and the latter case is exceptional. We will limit our attention to the 
generic case. 

Definition 1 A matrix A G T-LnjT '■= T^N,r H "Hjv belongs to the class T-L^ ^ if 

1. The upper left corner submatrix of order r is non-singular, 

2. In the central polynomial P {x) = 2;'^ J^^^]^(a; — Ci^)™'-', we have d = Qandm^ = Iforallv (and, consequently, 
p = r). 

Tlieorem 6 

JIn is a real 2(2N — \)- dimensional linear subspace ofyiN,N- 

'HJjif ^ is a real differentiable manifold of dimension 4r which is dense in T-Ln.t- Its complement 'HN,k \ 'K^ ^ is 
thin relatively to "H^ ^. 

The map tttj^ ^ is well defined at all points ofH^ ^. 

Proof: The first statement is obvious. For the second, it is easily seen that the complex numbers f{0),f{l),...,f{r — 
1), Ao, Ai, . . . , Ar-i in (fT4l i serve as complex coordinates on H^ ,., and that the exceptional points (corresponding to 
matrices not in W^ ^) are given by restrictions, confining them to a thin set. The third statement is immediate by the 
Eckart- Young theorem. 

3.2 Extracting frequencies from low rank Hankel matrices 

We note that the second statement in Theorem|6]can be seen as a finite-dimensional version of Kronecker's theorem. 
We will exploit it in order to approximate functions by sums of k exponentials; 

k 

/(0 = I]cpe«''', CpXpeC. (15) 

p=i 

We choose some positive weight w that gives rise to a weight a; through d?). The problem of approximating / by a 
sum of k exponentials in £" is then according to Proposition|2]equivalent to finding the matrix Hfopt G TlN,k H Hn 
that minimizes \\Hg — -ff/Hui- 

Let us turn our focus to how to find Cp and (p in ( fT5] l given Hf E W^ j.. If u = (mq, mi, . . . , Uk) is a vector in 
C'^+i, we define the polynomial P^, generated by u, by 



Pu{x)=Y^ 



k 

UiX-' . 



3=0 



From Proposition|4]it follows that the nodes e''^ in (flSl l are precisely the zeroes of the central polynomial P{x) for H f, 
and this polynomial is the last common divisor of all the polynomials generated by vectors in the nullspace of H f. 
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Alternatively, it is the polynomial generated by a single vector, generating the nullspace of {Hf)k+i- This approach 
is relatively fast (time 0{k^)), but it does not have good numerical stability. The reason for this is that we use only 
local data, i.e only k + 1 elements from each con-eigenvector «„. 

A better method is to observe that if / has the form ( fT5] ), then, due to ([3]), the con-eigenvectors ui,U2, ■ ■ ■ ,Uk of 
Hf span the same subspace of C^ as the vectors 

Z, = (l,e^e2Cs...,e(^-iK0, / = l,2,...,fc. 

Let t/ = (ui ... Uk) e Mnm- We then have U = ZG, where Z = {Z{-, l),Z{-,2), ..., Z{-,k) € MN,k is the 
Vandermonde matrix generated by e^p {Z{l,p) = e^p') and G is an invertible matrix in M.k,k- For any matrix A, we 
denote by A(j) the matrix that appears when the j-th row of A is removed. Clearly, we have that C/(i) = ^{i)^ ^^d 

[/(jv) = Z(^j^^G. We also note that 

Z(i) = Z(jv)diag(e^S...,e'^'=). 

Recall that f7(Ar) has a natural left inverse given by UJj^^ = (C/(*^-,C/(7v))^^C^*- From the relations above, it follows 
that 

Now 

(C/*^)C/(jv))~ U*^^Z(^N)G = (C^(W)^(JV))~ U*^^U(^N) = 4, 

and thus 

t^wt/(i) = G"'diag(e';S...,e';'=)G. 

Hence we can compute the nodes Cp by computing the eigenvalues of UJj^M(^iy This method is numerically stable 
and can be computed in 0{Nk'^ + k^) time. 

Once the nodes Cm are found, the problem of find Cm becomes linear, and again it will be sufficient to consider k 
consecutive elements solve the corresponding linear system. 

4 Alternating projections 

Given / S ^2JV-i' the problem of finding the best approximation in ^2Ar-i of the form fopt{l) = J2j=i '^i^''^' i^ hard. 
Instead, our aim is to find an fopt that is close to optimal. We will do this by employing alternating projections. By 
Proposition[2]we know that this problem is equivalent to 

argmin \\Hf - Hg\\^,. (16) 

By starting with Hfo = Hf and alternatively projecting onto the subsets TZN,k and Hn, the idea is that the so 
arising sequence Hf,n will converge to an intersection point Hfoo £ "Hw.fc = T^N,k H Hn, and moreover that Hfao 
is in fact close to the optimal one, H fopt- This idea was investigated in a general framework in f2\. The main result of 
||2l roughly says that the above scheme indeed works if we start not too far away from 'Hat and avoid the thin set of bad 
points related to TZN,k and T-Ln, (which in practice does not seem to be an issue). As an example we studied the case 
of projections between rank k matrices and Hankel matrices in non-weighted spaces. In this paper we make a more 
thorough study of this particular application and extend it to weighted spaces. Moreover, we discuss how to use the 
weighted spaces for approximating functions by sums of Gaussians,we discuss how to construct fast implementations 
of this idea, and finally we will also prove that the framework of f2\ indeed applies. 

We now state the main result of [2] in the current framework. Let Ptin k^ Ph and Ph^ denote the maps taking 
a given matrix B onto the closest point in the respective manifolds. Already here we hit some technical issues. We 
clearly have a formula for P-Hn since Htv is linear, so P-Hk is an orthogonal projection and an explicit formula is given 
by (O. Concerning P-jin k we do have a formula for computing it, but the drawback is that if B has singular values 
of higher multiplicity, then the map is not well defined. This is a common feature in algorithmic frameworks, and 
can be dealt with by introducing point-to-set maps, following ||3T| . However, this seems over-ambitious in the current 
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framework, since matrices with singular values of multiplicity > 1 constitute a thin set (Theorem |6]l, and arbitrary 
small (numerical) perturbation yields distinct singular values. Moreover, in [|2| we prove that P«„ is well defined near 
"regular non-tangential" points of T-Ln, and we will prove in Appendix [10] that the complement of such points is thin 
as well. With this in mind, we will from now on treat P-r,^ ^ and Pun ^s well defined maps. Note that there is no 
simple way of computing Phn- We will prove in Appendix [TOl that the theory developed in Q applies in the present 
setting. Combined with this, the Theorem 6.1 of Q reads; 

Theorem 7 For all A G T-Ljq outdside a thin subset, the following is true. Given any e > 0, there exists an s > such 
that, for all Hf with \\H f — A\\ < s, the sequence of alternating projections given by Bq = H f and 

R _ f PuNkiBj) j is even 
■'+1 " \ Pn{Bj) j is odd ^ ' 

(i) converges to a point H f^o G "H^ j, 

(n) \\Hfoo - HfoptW < e\\Hf - HfoptW 

A few remarks: (i) combined with Theorem|6]says that we will achieve an approximation of / of the form /oo(0 = 
J2i=i ^j^'^^^ ■ Moreover, note that if we had on the right hand side of (ii), then f^o = fopt- (m) says that the error 
1 1 foo ~ fopt II 2 can be made arbitrarily small relative to the distance 1 1 / — fopt \\ ^ ■ Finally, the full theorem in (2\ has a 
third post, but to define this we need to discuss angles between manifolds, which we like to avoid. Basically, the third 
post says that there exists a number < c < 1, whose lower bound is related to the angle between TInm and Hn at 
A, such that 

(^^^) ||iJ/oo - B,|| < c'\\Hf - HfoptW- 

For practical purposes, this is an important observation, since it says that the algorithm has so called c-linear conver- 
gence. 

Let us now briefly discuss what happens if we are not close enough to Hn for the above theorem to apply. First 
of all, we have never encountered a situation where the algorithm does not converge. Secondly, it is easy to see that 
both Ph„ and P-Rn k '^^^ contractions, so {Bj)^^ is a bounded sequence. It thus has a convergent subsequence by 
basic properties of compact sets. Moreover, it is easy to see that the distance ||-B;+i — Bi\\ is strictly decreasing with 
, and hence the limit point of the convergent subsequence is in T-Ln- (However, there is of course no indication that 
the corresponding f^o is at all close to fopt, so this observation has limit value.) In literature treating similar topics as 
in this article, one is usually content with concluding that the algorithm in question has the property that it generates 
a sequence with a convergent subsequence having a limit point in the desired set, and attributes this to Zangwill's 
theorem, ||3T1 . Clearly, Theorem |7] provides much more information in our setting; every point vi\T-Ln, outside some 
thin subset, has a neighborhood such that, if any Bj enters that neighborhood, the sequence [Bj)"^^ will converge. 
Since the sequence necessarily has more than one accumulation point if it does not converge, the only possibility for 
divergence is that (-Bj)°^q wanders back and forth along the valleys of the thin pathological set, between the hills 
constituting the open set formed by all nice neighborhoods mentioned above. This seems highly unlikely, but we leave 
it as an open question to rule out this possibility. Clearly, it would be interesting to have some concrete values of the 
parameters e and s in Theorem|7] We will return to this issue in what follows. 

Below is an algorithm that specifically describes how to apply the alternating projection scheme in our case. 

Algorithm 1 

1. Let fo = f,l = 

2. (Application of P-Rj^ ^} Compute the first I con- eigenvalues Sm ond the con-eigenvectors Um of H fi using 
Theorem^ The projection Pn^ k^fi '■* ^'^^'^ given 

k 

y^ SmyhTiU*^. (18) 

m=0 
12 



/^+i = ^ME 




3. (Application of Puj^) Compute 



4. Increase I and repeat from (2). 

5 The root-MUSIC and ESPRIT methods 

We briefly recapitulate the two most widely used methods for "high accuracy" frequency estimation. Our description 
will follow the implementation given in IZTl . 

In a previous section we noted that we can find the nodes for a function / of the form ( fTsT i, by considering the 
null space of a Hankel matrix that is generated from /. Recall that it was sufficient to consider a submatrix of size 
(k + l) X (fc+1) to accomplish this. Thenodescaninprinciplebefoundby finding the roots of the central polynomial, 
which is the polynomial generated by a the vector generating kcr iJ(i.+i) . However, just as discussed previously, this 
would lead to numerical instabilities, even when / is a pure a sum of k exponentials. From Theorem |5] it is easily 
seen that we can find the nodes by considering a singular value decomposition of a rectangular Hankel matrix, also 
generated from /. Let H^ f G M.2n~m,m, with Af > k, be such a Hankel matrix, and suppose that ( fTsT i holds. Then 
the nodes can in principle be found by finding the roots of any polynomial generated by a u G ker H^ f. Such a u is 
also in the kernel of 

{H\fnH\f){j,k)^Rf{j,k) (19) 

where < j,k < M. The matrix Rf is sometimes referred to as the sample covariance matrix. It may seem to 
be beneficial to work with Rf instead of with the full Hankel matrices, since it is in principle possible to choose M 
much smaller than N It appears tractable that we make eigenvalue decomposition on a smaller matrix, and that the 
root finding step is also done with smaller matrices. The standard implementations of root-MUSIC and ESPRIT in 
||27l work on for instance on Rf rather than Hf. However, just as discussed previously, a too small M can lead 
to numerical instabilities, even when / is purely a sum of k exponentials. Moreover, the matrix Rf needs to be 
computed. It is not hard to see that this can be achieved in 0{N log N + M^) time by splitting H^ into two parts and 
employing FFT. For large M this is not particularly advantageous. Another drawback is the loss of precision when 
forming {H-fY{H-f). 

The discussion so far has been conducted under the assumption that (flSl l is valid. In the typical situation this is not 
quite true; the standard assumption is that / contains additive noise as well. Alternatively, we could be interested in 
the compression problem of representing a function using only frequencies and coefficients, in which the additive part 
has more structure than white noise. 

Let 

Hrf = VT.U*. 

We will as before denote the columns of C/ by ui, . . . , um- In the noiseless case, we did see that we had a great deal 
of flexibility, as any u„j, k < m < M could be selected to find the nodes. The root-MUSIC method exploits this 
property, and tries to use all of the vectors Um, k < m < M to reduce the influence of noise. In the root-MUSIC 
method, roots are found by solving 

M 

Pmvsic{z)= Y. Pu^i^)P—{z) ^ 0, 

m—k-\-l 

where " reverses the order of the elements in a vector Loosely speaking, this choice is motivated by the facts that the 
roots will appear in pairs when / is a linear combination of purely oscillatory exponentials. There will be 2M — 2 
roots to Pmusic(2:) = 0. The pairs associate with the true nodes, will have Rc(C) ~ 0, with one slightly larger than 
zero and one sHghtly smaller. For a more detailed justification on the choice of PmusiCj cf. lIZTl . 

In the general case, where there is no constraint on the nodes (p, k roots need to be selected out of the 2M — 2 
that are given from Pmusic (^) = 0. In the simulations performed in the later sections, we have used the MUSIC code 
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provided in 1271 . and added a selection step where we approximate / using all 2M — 2 nodes using a least squares 
approach, and then selecting the k nodes with largest coefficients. It appears unnecessary to compute nodes that have 
to be neglected. 

The ESPRIT method avoids the step of computing unnecessary nodes. Instead, a similar approach as in Section|3] 
is used. For the noiseless case, it is readily verified that the eigenvalues of 

iU{M)U{M)y (t/(A/)t^(l)) 

will coincide with the eigenvalues of e^™, m = 1, . . . , fc. In the ESPRIT method the eigenvalues of the expression 
above are used to compute nodes also in the case where noise is present, cf. Il23l 

We end this by section by a few remarks about the connection to autocorrelation and Toeplitz matrices. For a 
function of the form (fTSl l where the exponentials are purely harmonic (zero real part of (p), it holds that 

lim ^Rf 

is the self-adjoint Toeplitz matrix generated by the autocorrelation of / (where / is the AN + 1-point sampling of a 
fixed function on a fixed interval). According to a Theorem by Caratheodory |[28l, if the self-adjoint Toeplitz matrix 
generated by a function has rank k, then that function can be expressed as a sum of k purely oscillatory exponentials. 
This motivates alternating projection schemes between the manifolds of Toeplitz matrices and low rank matrices, 
for the approximation of the autocorrelation of a function. However, the effect of a finite sample length can not be 
neglected, and the Toeplitz matrix generated from the autocorrelation of a pure sum of k oscillatory exponentials, will 
fail to have rank k. For that sake, the approach we have chosen seems preferable. 

6 Fast algorithms 

There are two operations for which we will need fast numerical methods in the alternating projections approach for 
frequency detection; (low rank) Takagi decomposition, and the application of the averaging operator ( fTsT l. It turns 
out that both operations can be implemented in a fast manner, but the first one will require some more effort than the 
second. 

Proposition 5 The application of a Hankel matrix to a vector can be done in 0{N log(iV)) titne by means ofFFT. 

Proof: This is a standard result ifTTI . and makes use of the fact that circular matrixes are diagonalized by the discrete 
Fourier transform, and that it easy to construct a circular matrix from a Hankel matrix by permutation and periodic 
extension. The 0{N log(iV)) time complexity can then be achieved by employing FFT. 

Proposition 6 The weighted averaging operator H* in U8\l can be applied to a rank 1 matrix in 0{N log(iV)) time. 

Proof: By definition 

H*{uu^){l) = 4t E ^{jXj)n{k)w{k), 
^^ ' j+k=i 

^^ T. "OXfc)' 2 < Z < 27V, 
^^ ' j+k=i 

where v = wu. It is easy to see that the sum above can we written as a discrete convolution {v{j)v{l — j)) using zero 
padding to avoid boundary effects. The discrete convolutions can then be computed in 0{N \og{N)) time using FFT. 
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6.1 Lanczos method for complex symmetric matrices 

We will use a modified Lanczos method for finding the k first con-eigenvalues/con-eigenvectors. The Lanczos method 
is a way to perform a unitary transformation of a Hermitian matrix to tridiagonal form, i.e., given A ~ A*, compute 
T — Q*AQ, where Q is unitary and T = T* is tridiagonal. We need a similar decomposition for complex symmetric 
matrices. The usage of a modified Lanczos method has been addressed in lfT6l l29l l26l . As we only need to compute 
the first k con-eigenvalues/con-eigenvectors, we develop a method customized to that purpose. 

The basic step in the Lanczos method is simple. However, it is notorious for the loss of precision, sometimes in 
a counterintuitive way. This issue must be addressed carefully. The columns in the unitary matrix Q are computed 
sequentially, in such a way that each new column is automatically orthogonal to all previous ones. In practice, finite 
numerical precision can ruin the orthogonality, and it can be completely lost in within just a few steps. Two meth- 
ods that address this are selective orthogonalization |fT9l and partial orthogonalization ll25l . We will make use of 
ingredients from both these methods in our particular setup. 

For a given symmetric matrix A E Mtvjv, we look for a unitary matrix Q, complex numbers ai, a2, ■ ■ ■ and 
nonnegative real numbers /3i , /32, • ■ •, such that 

T = QAQ*, (20) 



where 



fa. 


/3i 


. 








Pi 


a2 


/32 . 











/32 


as . 














. 


• aN-i 


Pn^i 


V 





. 


■ Pn-i 


OiN 



\ 



(21) 



with Pj > 0. The matrices Q and T can be constructed as follows: We want to achieve AQ = QT, which means that 

Aqi = aiqi+ I3iq^ 



and, for 2 < j < m — 1, 
and finally 



Mj == Pj-ilj-i + Oijqj + Pjqj+i 



AqN = PN-iqN-i + aNqN, 
where gi , 52 ,•••,<? at are the columns of Q. We choose a unit vector qi and define 

ai ^ {Aqi,q^) = qjAqi, (3i = \\Aqi - aiqT\\, q2 = — {Aqi - aTqi) 



if f3i 7^ 0, and then, recursively 



aj = iAqj,qj), 
P] = WMi - ft-i9PT- aj-g^ll, 



(22) 



as long as fij 7^ 0. One readily verifies, by induction, that the vectors qj are orthonormal. If, at some step before the 
last one, /3„i = 0, then the subspace X = span((ji , 52 , ■ • ■ , qm) of C^ has the property 

For a complete factorization, we can then choose a unit vector in the orthogonal complement in X and proceed. It is 
easily verified that Aq £ X-^ if q E X^, so the procedure will eventually yield an orthonormal basis qi,q2, . . . ,qN 
for C^, having the desired property. However, as will be discussed in what follows, we will be content with a partial 
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decomposition, and the vanishing of /3„i at some step generically impUes that we do not need to proceed further. Now 
let Qm € M7v,m consist of the first m columns of Q and let T„i be the upper left corner m x ?7i-submatrix of T. Write 
Qm = {qi , ■ ■ ■ 1 Qm) and let T!,„ denote the m x m upper left corner submatrix of T. By standard arguments one sees 
that the con-eigenvectors of Am converge to those of A, and moreover that for con-eigenvalues with a low subindex, 
this convergence is obtained (within certain precision) with high probability (depending on xq) for m « N . 

The immediate application of the modified Lanczos-method outlined above is that we can compute con-eigenvector 
and con-eigenvalues for T instead of for A (cf. discussion about con-similarity lfT2l p244, p251]), which due to the 
tridiagonal structure of T it is beneficial. Moreover, since in our setting we are only interested in the first k con- 
eigenvectors and A; < < A^, it suffices to work with T„j for a relatively low number of m, increasing the computational 
speed. The following lemma makes precise the claim that the con-eigenvalues of T^ converge to those of A. 

Lemma 1 Let A ~ A^ be given and let r,„ be as above. Denote the con-eigenvectors ofTm by Uj and the corre- 
sponding con-eigenvalues by fij, 1 < j < m. Then, for each j, there is a con-eigenvalue Xj of A such that 

I Mj - Aj I < Prn \ujiin)\. (23) 

Proof: Set e„i = (0, . . . , 1) G R™ and note that AQm - 'Q^Tm = ^,„(7,„+ie^ by ^^. We apply this to Uj to get 

\\AQmUj - Q^TynUjW = \\AQ,nUj - HjQrnUjW = || ^„igm+ie^Mj |1 = l3m\Uj{m)\, 

and introduce Wj — QmUj. Since ||uj|| = 1, it follows that \\wj\\ = 1. Denote the con-eigenvectors of A by vi, and 
represent Wj ~ J^i ^I'^i' J2i I'^'P = 1- ^^ ^^^^ ^^^^^ 



\AQjnUj ~ QmTmUjW^ = \\Awj - HjWjW'^ = jl ^ Sl{Xl - fi-j 



Pj)Vl\ 



> mm\Xi - /ijp Y^ |s/|2 = mill |A/ - 



This is a well known result for the case of Hermitian symmetry, see for instance ||2()1 p. 69]. A similar result is given 
in 1261 Proposition 2.2]. 

Lemma [U provides a way to control the convergence of con-eigenvectors. When the quantities in ( |23] ) are small, 
then Wj will be a good approximation of the con-eigenvector to A that is associated with A^ . In many cases, con- 
vergence for the first con-eigenvalues are reached for comparatively small m. In particular, for the case where the 
(con)spectrum of A has a large gap after, say k terms, it is typically only necessary to use m slightly larger than k. 
This will be the case for all but the first step in our alternating projection algorithm. 

As mentioned before, in a straightforward Lanczos implementation the orthogonality of Q will quickly be lost due 
to finite precision arithmetics. Moreover, and somewhat counterintuitively, the loss of orthogonality will grow as the 
con-eigenvectors converge, cf. |fT9l . The simple remedy to this problem is to reorthogonalize qm+i to all previous 
Qj at each iteration. However, this increases the algorithmic complexity of the method. Instead, we want to have a 
criterion on when reorthogonalization is needed. The loss of orthogonalization is also indicative of con-eigenvalue 
convergence. 

Two suggestions on reorthogonalization criteria are given in 1251 [191 . We will follow the approach given in l25ll . 
Since we are working with con-eigenvalues and con-eigenvectors instead eigenvalues and eigenvectors, we briefly 
provide the details. 

Due to finite precision arithmetic, we model ( l22b as 



An^m+l = Aq,n - g^arn " g„_i/3,„_i + Cm, (24) 

where e^ describes the error introduced by the finite precision. We now let qj denote the vectors computed from the 
relation (l22l i. Due to the errors £,„, these vectors will not be orthogonal. Let tuj^k ~ q^qk- Then uij^k will satisfy the 
recursion relation 

Wm+l,m+l = 1, l^m+l.m = Qm+l1m — VAn+l, (25) 

+ l.,j — -3 — {'^j^mj + /3j-lWm,j-l + I3j^m,j + 1 — Oim^mJ ~ Pm-l^m-lJ j + ^m,jy 

Pm ^ '' 



Pi 
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where "drnj = P^iqJ^m — Qm^j)- The last equaUty follows from multiplying ( |24] | by qj, and subtracting the same 
quantity with the indices j and rn interchanged. Since A = A^ the quantity qjAqm then cancels. 

Using the recursion formula above, we can monitor the level of lost orthogonality without explicitly having to 
compute inner products of the columns of Q. In analogy with the empirical results in 125111911 . we simulate the error 
quantities as 

^™,,gN (0,0.3 £(/?„+ /?,)), 

V',„+ieN('o,0.6e(2iV + l)|^ 

where N(0, cr) denotes the complex normal distribution with standard deviation a, zero mean and independent real 
and imaginary parts. Above, e denotes the machine precision. 

The maximum loss of precision that can be tolerated without loss of precision in the coefficients a„,+i and Pm 
is y/e. Once some cum+ij exceeds that level, it is necessary to reorthogonalize. As seen from (|25l l, each Um+ij 
is strongly influenced by its neighbors. Hence, it will not be efficient to only reorthogonalize against the vectors qj 
where 0Jm+i,j, since for isolated j's the orthogonalization would immediately get lost in the next iteration. Instead, 
it is beneficial to reorthogonalize against a batch of qj's. Hence (and in accordance with 1251) we reorthogonalize 
against the set of qj which have \uj.,n+i,j \ > f "^^'^ once \iiJm+i.j \ > \/e for some j. 

After a reorthogonalization has taken place, we need to reset the quantities cum+ij- Again following l25ll . we 
choose w„j+i J- e N(0, 1.5e). 

The final ingredient is a rule for when to utilize Lemma[T]for convergence monitoring of con-eigenvalues. Clearly 
771 > A: in order to find k con-eigenvalues that have converged. Since the convergence of con-eigenvalues and the 
loss of orthogonality are coupled, we compute a Takagi factorization of T„i, once loss of orthogonality is indicated 
by |w,„-)_ij| > ^/e for some j, given that m < k. Moreover, we can monitor the behavior of /3,„ to check for 
convergence. If /3m becomes very small for some m, then it means that Qm defines an almost invariant (con)subspace 
under A, which implies convergence of the (non-zero) con-eigenvalues. We let el denote the desired resolution of 
con-eigenvalues, and impose the convergence criterion 

As always with numerical implementations, it can be difficult to determine how small j3„i has to be in order to consider 
it to have almost vanished, i.e., if el is chosen very small. A typical feature of this case is that the last value /3 jumps 
dramatically in size. This behavior also serves as a good criterion for when to check for convergence by means of 
Lemma [H 

In the procedure above, we need to compute the Takagi factorization of r,„. The cost of that step when using 
Proposition [T] is 0(771"^). However, due to the tridiagonal structure there are methods to compute this in 0{m^) time, 
cf. IIT6I [30I |29l . These methods are based on straightforward modifications of methods for eigenvalue decomposition 
of tridiagonal Hermitian matrices. 

The most expensive step in the Lanczos procedure described above is the matrix vector multiplication Aq^ in ( l22l l. 
However, this step can be computed in 0{N log N) time by Proposition|5] 

Proposition 7 The time complexity for computing the first k con-eigenvectors and con-eigenvalues of a Hankel matrix 
to accuracy E using the modified Lanczos method described above is 0{mN log N + m^), where m denotes the total 
number Lanczos steps, and where m > k, but where m is typically of the same order as k. 



7 Numerical simulations 
7.1 Performance analysis 

In this section we compare the performance of our approach against the ESPRIT and root-MUSIC methods. We 
simulate functions of the form / = /o + n, where 



/o(0 = 5I^P' 



Xpi 




and where n is a noise component. The coefficients Cp are chosen as complex normal distributed variables, and the 
nodes as Cp = 1/(47V + 1)(50Z^ + iZ'^), where Z^ and Z* are normally distributed. 

The noise component is constructed by letting n{k) = nr{k) + ini{k), where n,. and n,; are normally distributed 
noise, and where 



10-SNR/lO^^ 



for some signal to noise parameter SNR. By this construction, the signal to noise ratio will be exactly equal to the 
parameter 5A^i? when measured in dB. Throughout the tests, we have chosen to work with a signal length of 511, i.e. 
N = 256. This is chosen to make the FFT routines run fast. All simulations have been run in a MATLAB environment, 
without any compiled optimizations. For the ESPRIT and root-MUSIC, we have used the routines provided in 1271 . 
with minor modifications to make them work for the case Rc(Cp) 7^ 0. The accuracy parameter e used in the alternating 
projection method has been chosen to be a factor 100 lower than the noise magnitude. 

In figure [T] we show some simulation results for the different methods. We conduct a small number of simulations 
for SNR = 10 dB and k = 10, and consider the performance in terms of the errors generated by the different 
methods. We display the errors are displayed in two ways; in relation to the pure signal / and in relation to the noise 
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Figure 1: The solid curves show the error \\g — /||/||/|| with g obtained with MUSIC (green), ESPRIT (red) and our 
propose method (blue), respectively, for SNR=10. The dashed ones show the counterparts for \\g — /o||/||/o||- 

one /o. 

We see that our proposed method systematically has a smaller error in both ways of measurement. We also note 
that for all methods we have a substantially smaller error when compared to the pure signal /o instead of the noisy 
one. Hence, all three methods successfully filter out a large part of the noise. It is also notable how close the error in 
relation to / is to the signal to noise ratio for our proposed method. This is also implied by Theorem|2l Basically, in 
the notation of Section]?] we have g = go, and it is reasonable to assume that /o « gopt- This is because the noise 
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has a high probabihty of being orthogonal to /o, and that Pm-^hu locally acts as an orthogonal projection, (which is 
further elaborated on in ||2l)- Thus, Figure[T]can be interpreted as the upper blue line shows j|/ — gopt|l/||/|l, whereas 
the lower blue line gives an indication of the size of ||goo — 5opt ||/||/||- In terms of Theorem|7]with A = TJ/q of norm 
1 and s — 0.1, this means that we can pick e around 0.1 as well. Although the above images are constructed using 
standard £^-norm, not the weighted one required for Theorem|7]to kick in, it is interesting to observe that this is in line 
with the observations in IJ). There, using more carefully conducted examples to test Theorem|2l it seems that one can 
take s w e when working with /c w 10. 

It is interesting to see how these result depend on the different parameters, i.e., the number of nodes k and the 
noise level SNR. In Figure|2]we have conducted more thorough investigations. For each k = 1, 2, . . . , 30 we have 
done 100 simulations and computed average results. The averaging has been made in dB, in order to limit the effect 
of outlier results. As for Figure [T] we display errors in two ways, using solid lines for errors in comparison to the 
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Figure 2: The left panel shows results for SNR=10, and the right one results for SNR=30. The solid curves show 
the dependence on k of the average error \\g — /||/||/|| with g obtained with MUSIC (green), ESPRIT (red) and our 
propose method (blue), respectively, for SNR=10 . The dashed ones show the counterparts for ||g — /oi|/l|/o||- The 
average for each k is done over 100 simulations. The thin lines show the results obtained for MUSIC and ESPRIT for 
M = 4fc, whereas the thick lines show results for AI = A^. 



noise signal / and dashed lines for the comparison to the original one, /q. We also display some of the impact that the 
choice of size (M) of R in ( fT9l ) has. The thin lines in red and green show errors for M = 4k and the thick lines show 
the counterpart for M = N — 256. 

There are a few interesting conclusions that can be drawn from the results depicted in Figure |2] First, we note 
that for the cases were k is small, all three methods perform comparably well, given that the size (M) of the sample 
covariance matrix used in MUSIC and ESPRIT is sufficiently large. However, as k increases, the alternating projection 
method starts to outperfom the other two. We can again note that the errors (compared to /) produced by the alternating 
projection method almost coincide with the signal to noise ratio. Moreover, in terms of Theorem|2l Figure |2] seems to 
indicate that e « s is a good rule of thumb, although the ratio gets slightly worse as the complexity of the manifold 
A^" n "H increases with increasing k. 

From the results we have seen so far we can conclude that the alternating projection method should be the method of 
choice unless k is very small, given that the prime concern is to minimize the estimation errors. The other criterion for 
method selection is speed. The computational times for the different methods is displayed in Figure |4] As mentioned 
before, the MUSIC and ESPRIT algorithms that are used are slightly modified versions of the ones given in ll27l . In 
Figure |3] the fast alternating projection method is the fastest. The MUSIC and ESPRIT algorithms are substantially 
slower for high M. For the MUSIC algorithm, the most time consuming step is the root solving step. We note, 
however, that by using our fast method for finding the first k con-eigenvector / con-eigenvalues, we can construct a 
method that would have much resemblance with the ESPRIT method as described in 11231. It seems advantageous 
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Figure 3: Average execution time for the different methods in milliseconds. The line notation as in Figure|2]is used. 

to work directly with Hankel matrices rather than the covariance sample matrix of ( fT9] l. Using such an approach, 
we would be able to construct a computational method that would provide similar results as the ESPRIT algorithm 
described in ||271 . but substantially faster than the one based on the sample covariance matrix. From the results in 
Figure|2]we can conclude that the results would not be as good as the ones obtained by the fast alternating projection 
method proposed here. However, it would be faster, as it would only involve one decomposition step. In other words, 
it would be equivalent to using the alternating projection scheme with only one iteration. 

A natural question would then be how much faster such "fast" ESPRIT algorithm would be. A first guess would 
be that the speed ratio would be proportional to the number of alternating projections performed before the target 
accuracy e is reached. It turns out that the fast alternating projection method is faster than that. The reason for this 
is that fewer Lanczos iterations are required in each alternating projection iteration. In Figure|4]we display the ratio 



Figure 4: The ratio between the time for total number of iterations compared to the first one for the alternating 
projection method for 1000 simulations 

between the total time and the time for the first iteration for the alternating projection method. We see that the ratio 
typically lies around 2. This means that the fast alternating projection method would only be about twice as expensive 
as a fast implementation of ESPRIT, while providing smaller errors. Again, we note that the proposed fast alternating 
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projection method is substantially faster than the standard implementation of ESPRIT and MUSIC. 



7.2 Approximations with Gaussians 

As a final example, we show some results concerning the approximation of functions with Gaussians with fixed half- 
width, using the fast alternating projection method with Gaussian weights. There are two possible interesting cases. 
The first one concerns the case where the functions are of the form 



^Cpe~"(^-^ 



+i^pX 



(26) 



with fixed (and known) constant a. The second case concerns the approximation of functions using a Gaussian 
window, for example as done in time-frequency analysis. Using a non-linear approach may be beneficial compared to 
short-time Fourier transform representations with overlapping windows. However, we will in this section only show 
some results concerning ( |26] ). 

In Figure |5] we show the result from one simulation using a function of the form ( |26] |, using 10 Gaussians. In 
order to approximate this function using exponentials, we choose the weights w such that y^ approximates e^"'/^^, 
I = --2N, . . . , 2N. To this end, we choose 



Wk 



iO^_^-4ak/4N k=-N,...,N. 



AN + iy TT 

For sufficiently narrow Gaussians (large a), we will then have that y/uJi w e^"'' ^^. Just as before we let /o be of the 
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Figure 5: In the top left panel the noisy signal / is shown, the top right shows the original fa, the bottom left shows 
the reconstruction, and the bottom right shows errors. The errors are shown unsealed in gray, and scaled with respect 
to y/w in black. 

form ( |26] | and use additive noise to obtain /. One simulation is shown in Figure|5]for SNR = 10. Before we start the 
alternating projection scheme, we divide / pointwise with 1/y^. This will boost the amplitude at the endpoints of / 
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substantially, but since we approximate using w; as a weight, we will obtain a uniform approximation. The noise will, 
however, not be uniform with this approach, but larger at the end-points. 

The result from one simulation is shown in Figure |5] The noise signal is depicted in the top left panel, while the 
original is displayed in the top right panel. In the bottom left we see the obtained reconstruction. We can see that most 
of the features from the original signal is captured. In the bottom right panel we show pointwise errors; in black the 
error weighted with w; and in grey the unweighted pointwise error. 

8 Conclusions 

We have developed a method for the fast estimation of complex frequencies using an alternating projection scheme 
between Hankel matrices and rank k Takagi representations. The method has a time complexity of 0{nN log iV + rt'^). 
FFT routines are used both to get fast matrix-vector multiplications, and to project rank k representations to Hankel 
matrices. In order to compute the first k Takagi vectors, we employ a modified Lanczos scheme for self-adjoint 
matrices. The number of necessary alternating projection steps depends on an accuracy parameter, but in typical 
situations the total time is only twice as large as the time needed for the first iteration. The reason for this is that fewer 
Lanczos steps are needed when the matrix we obtain is closer to being both Hankel, and rank k. 

In our simulations we see that the proposed method performs better both with regards to speed and approximation 
accuracy, compared to standard implementations like root-MUSIC and ESPRIT. We also verify that the errors that we 
obtain behave in the manner theoretically predicted in |2l. The method works also for some weighted representations. 
A particular case of weights that can be used are Gaussian weights, for which case some numerical examples are 
provided. 

9 Acknowledgements 

This work was supported by the Swedish Research Council and the Swedish Foundation for International Cooperation 
in Research and Higher Education. 

10 Appendix; the set of tangential points in Hk is thin 

Following the terminology of [21, a point A E T-Lk is called regular if the dimension of TZk Hk and H are constant 
in a neighborhood of A. Thus theorem |6] says that the set of non-regular points is thin. Moreover, recall that a point 
A G Hfe is called non-tangential if 

TnM)(^TniA)=Tn,nn- (27) 

In order to prove Theorem]?] we need to show that the set of tangential points in Hk is thin, and then apply Theorem 
6.1of||2l. 

Theorem 8 The set of tangential points is thin in T-Lk- 

Proof: By Theorem |6] we immediately get that all points in W^ are regular and that Ti, \ T-Vj^ is thin. To verify that 
A e HI is non-tangential, it thus suffices to establish (Elli, e.g. that Th{A) n TT^d{A) = ^^^(A), since HI C 7^^. 
Clearly 

TH{A)r\TT^.{A)^THi{A). (28) 

By Theorem|6]and the fact that A is regular we have dim(T>^i. {A)) = Ak and 

dim(T„(A) n Tt^.{A)) = dim{Tn{A)) + dim(T^.(.l)) - dim(T«(A) + Tt^.IA)) = 
= 2(2n-l) + 2(2/cn-fc2) - dim(T«(A) + T^d(yl)). 

To establish the reverse inclusion to ( l28b , it thus suffices to show that diin{T-u{A) D T-j^d [A)) < 4fc, or equivalently 

dim(T«(A) + T^^d (A)) > 2(2n - 1) + 2(2fcn - k^) - Ak. 
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Moreover, since both subspaces are closed under multiplication by C, it suffices to verify 

dimciTniA) + Tj^diA)) >2n~l + 2kn-k^- 2k, (29) 

where dime denotes the dimension over C. To this end, note that the map 227 : (M„^fc)^ -> M„_„ given by 



WiU,V)^VU*^^v,u*, 

(where Uj, Vj denote the columns of U and V respectively), is an immersion onto TZk- By this we mean that for each 
AeTZk there exists Ua, Va such that A = W{Ua, Va) and, if ^ G TZf, then 

where dW denotes the derivative of 21T. In this section we define il : C" ^- M„,fc and QJ : C" x C" ^- M„,fc via 



/ 1 



ilia) 



1 \ 



ai 



ak 



QJ(c,a) 



V a? • • • a^ / 
It is easily seen that, given any a e C, the matrix 
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N-l 



a^ 



2W-3 
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N-l 
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^2N-3 ^2N-2 



defines a rank 1 Hankel matrix. Thus 

k 

is a rank k Hankel matrix. It is clear that 

Sj{c,a) =2n(U(a),QJ(c,a)). 
Thus, whenever A = Sj{c,a) £ H^, we have 

T^^A) = Rana2n(il(a),53(c,a)). 



(30) 



(31) 



(32) 



Now, it is not hard to see that dW{ii{a), 2J(c, a)) is a polynomial in the variables c and a. To visualize, say that n = 3 
and k — 2. Then the right hand side is given as the span of the 12 matrices 



(J = l,2) 




and 



c, 
Cj a j 
Cja'j 



CjUj 
Cja^j 



Cj 
CjUj 
Cja] 



(J = l,2). 
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Moreover, picking a basis for M„_„ (for example, the standard one which we order lexicographically), the right hand 
side of ( |32] | can be identified with the range of a matrix with polynomial entries which we denote by dW{ii{a) , 5J(c, a) ) . 
To continue the example, we get 

\ 



dW{- ■ ■ ) 



With 



T-L is spanned hy Ei, . 
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(33) 



. £'2n-i, where the notation is self-explanatory. In our example we get 

/ 1 \ 

10 

10 

10 

H = Ran 10 

10 

10 

10 

\ 1 y 



(34) 



■ ) and Ti. by [d'W Ti] . To verify ( |29l ), it thus suffices to show 

(35) 



Let us denote the matrix obtained by adjoining dW{- 
that 

Rank [dW H]>2n-1 + 2kn - fc^ - 2fc, 

holds, evaluated at (il(a), 93(c, a)) for some c, a such that i5(c, a) £ H^. Note that 

(z) If we can establish ( |35] ) for one point A ~ :P){c, a), then it easily follows that ( |35] | holds at all but a thin set of 
points A. To see this, set g = 2n — 1 + 2kn — k^ — 2k and first note that we can pick a q x q submatrix of 
[dW H] whose determinant is a non-zero polynomial. Thus, by standard algebraic geometry, the set of points 
(c, a) where the determinant is zero is thin in C^*^. Finally, it is also clear that the image of a thin set under a 
chart, in this case S), is again thin. 



(u) Let B = W{Ub, Vb) eUkhea point such that 

Rank dW{UB, Vb) = 2kn - k^ , 



(36) 



but where {Ub, Vb) is not necessarily in the closure of the range of (IX, 23). We claim that in order to establish 
(i), it suffices to establish ( |35] ) at the point {Ub, Vb). To see this, first note that by ( |36] ). TZk is locally a manifold 
(of dimension 2kn — k^) around B, and we can take an affine subspace of J\f C M^^ j. containing {Ub, Vb) such 
that 2If|^ becomes a local chart for TZk- If ( l35] l holds for {Ub, Vb), then arguing as above with determinants, 
it holds in a neighborhood of {Ub, Vb)- By Theorem |6] H^ is dense in Hk, so in particular we can pick a 
C e H^ and corresponding Uc, Vb e TV and cc, ac £ C" such that C = W{Uc, Vc) = Sj{cc, ac) and ^ 
is satisfied for [dW{Uc, Vc)n]-By^ and ^ we have 

Ran dW{Uc,Vc)=TT^^^{C) = Ran dW{!d{ac),'^{cc,ac)), 

which shows that ( |35] | is satisfied at (il(ac'), QJ(cc etc)), as desired. 
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So, it remains to verify ( |35] | and ( |36] | for some point B = '20{Ub , Vb ) G Hk ■ In terms of our example, we pick 

Ub = 
so that B becomes the rank 2 Hankel operator 

B 
Then dW{UB, Vb) is spanned by the 6 "^-derivatives"; 





/ 1 1 


Vb = 


1 
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1 1 \ 




10 




0/ 






and the 6 "[/-derivatives; 




This is clearly an 8-dimensional space not including 




which happens to be £'5 in the basis for H, and thus 

Rank [im{UB, VB)n] =9 = 2^3-1 + 2*2*5- 2'^ -2*2, (37) 

establishing (l35T l in this particular case. The reason for working with this simple example, is that it is easy to generalize 
the idea to arbitrary k,n, but hard to write down and we want to omit the details. Roughly, in the general case the 
"T^-derivatives" will span the first k columns of M„.„, whereas the [/-derivatives will span the first k rows. Thus 
Rank dW{UB, Vb) = 2kn — fc^, as required in (l36T l. Moreover, it is easy to see that {Ei, . . . , E2k} is a subset of 
Ran dW{UB, Vb), whereas {E2k+i, ■ ■ ■ , E2n-i} form a basis for a disjoint subspace, (except for the point zero), see 
Fig [To] In general we thus get 

Rank [dW{UB, Vb), T-h] = 2kn- k^ + 2n - I - 2k, 

as desired. 
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